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Abstract:  Discrete  elliptic  problems  can  often  naturally  be  partitioned  into  subproblems  correspond- 
ing to  subregions  into  which  the  region  has  been  partitioned  or  from  which  it  was  originally  assem- 
bled. Fast  conjugate  gradient  methods  have  been  developed  to  handle  the  continuity  conditions,  across 
the  interfaces,  that  need  to  be  satisfied  by  the  solution  of  the  entire  problem. 

In  this  paper,  we  focus  on  a  special  family  of  problems  that  arises  in  combustion  engine  model- 
ing. We  demonstrate  that  efficient  algorithms  can  conveniently  be  built  using  standard  software  to 
solve  elliptic  problems  on  simple  regions.  We  give  a  convergence  analysis  which  is  much  simpler 
than  the  general  theory  for  domain  decomposition  algorithms.  We  also  show  that  the  use  of  multigrid 
cycles  to  solve  the  subproblems  inexactly  can  be  very  efficient. 


A  Domain  Decomposition  Laplace  Solver  for  Internal  Combustion  Engine  Modeling 

1.  Introduction 

The  interest  in  domain  decomposition  methods  has  increased  considerably  in  recent  years,  to  a 
large  extent  because  of  their  promise  on  parallel  computers.  There  are  frequent  conferences;  cf. 
Glowinski  and  Periaux  (1987).  Much  has  been  learned  on  how  to  design  optimal  or  almost  optimal 
iterative  algorithms  to  handle  the  interaction  between  solvers  on  the  subregions.  In  this  paper,  we  will 
examine  the  use  of  one  of  these  methods  for  a  particular  problem  arising  in  combustion  engine  model- 
ing. We  note,  however,  that  the  ideas  explored  here  can  be  adopted  to  other  classes  of  problems  in 
fluid  dynamics  etc..  By  focusing  on  a  quite  specific  problem,  we  are  able  to  demonstrate  that  the 
theory  sometimes  can  be  simplified,  that  the  algorithms  can  be  speeded  up  by  using  special  features, 
and  that  standard  software  can  be  used  to  assemble  a  solver  on  relatively  simple  but  non-trivial 
regions. 

We  describe  a  Laplace  solver  on  a  domain  Q  of  the  shape  shown  in  Fig.  1.  This  solver  will  be 
used  10  compute  the  potential  flow  component  in  a  random  vortex  method  applied  to  a  problem  with  a 
moving  piston;  see  Sethian  (1987).  Related  work  is  described,  e.g.,  in  Sethian  (1984),  Sethian  and 
Ghoniem(1986). 

In  Fig.  1,  the  large  rectangle  fi^^'  represents  a  cylinder,  and  the  small  attached  quadrilaterals, 
the  union  of  which  we  denote  by  ii*",  represent  so-called  inlets.  There  are  no  inlets  attached  to  the 
lower  side  of  fi^'.  The  number  of  inlets  attached  to  the  left,  right  and  upper  sides  of  £i*^\  their  posi- 
tions, and  the  angles  which  they  form  with  ii^'  are  left  variable  in  our  program.  Sethian's  calcula- 
tions are  time  dependent  with  the  lower  side  of  Q'^'  moving.  It  is  a  fractional  step  method,  and  one  of 
the  steps  requires  the  solution  of  a  Neumann  problem  of  the  fonn 

-A(|)  =  Oonn  (1.1) 

|^  =  ^onaa     '  (1.2) 

an 
Here  0  is  a  velocity  potential.  The  Neumann  data  g  and  the  position  of  the  lower  side  of  Q*^', 

representing  the  top  of  the  piston,  vary  with  time.  The  lowest  inlet  always  remains  above  the  lower 
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Equations  (1.1)  and  (1.2)  are  discretized  with  piecewise  linear  finite  elements  on  a  quasi- 
uniform  mesh  as  described  in  Section  2.  The  region  has  re-entrant  comers,  and  this  is  known  to  limit 
the  regularity  of  the  solution,  cf.  Grisvard  (1985).  As  a  consequence,  the  discretization  will  yield  only 
a  crude  approximation  to  0,  unless  the  mesh  width  is  very  small.  On  the  other  hand,  the  resulting  sys- 
tem of  linear  equations  has  a  relatively  simple  structure  and  can  be  solved  very  efficiently.  We  will 
focus  on  this  case  in  this  paper,  but  discuss  possible  remedies  through  local  refinement  of  the  mesh  in 
the  last  section.  One  of  the  domain  decomposition  methods  studied  by  Bj^rstad  and  Widlund 
(1984,1986),  the  so-called  "excellent"  or  Neumann-  Dirichlet  algorithm,  will  be  used.  A  careful 
motivation  of  this  choice  is  given  in  those  papers.  For  the  problem  at  hand,  we  use  a  fast  Poisson 
solver  for  the  rectangular  region  and  band  Cholesky  for  the  relatively  narrow  inlets.  The  general  con- 
vergence analysis  of  this  method  requires  tools  firom  the  theory  of  elliptic  boundary  value  problems; 
see  Bj^rstad  and  Widlund  (1986),  Widlund  (1987b).  However,  for  the  geometry  and  meshes  under 
consideration,  a  much  simpler  convergence  analysis,  using  elementary  mappings  and  reflections,  can 
be  given.  This  results  in  quantative  and  realistic  bounds;  see  Section  4,  where  results  of  numerical 
experiments  are  also  presented. 

Recently  there  has  been  an  increased  interest  in  the  use  of  inexact  solvers  for  the  problems 
defined  on  the  subregions,  see  Bramble,  Pasciak  and  Schatz  (1986)  and  Widlund  (1987a).  In  the  fifth 
section,  we  demonstrate  that  the  exact  solver  for  Q°^  can  be  replaced  by  a  multigrid  cycle.  The 
number  of  outer  iterations  grows  modestly,  but  the  cost  per  step  is  considerably  decreased.  We  also 
give  bounds  for  the  rate  of  convergence  of  this  variant  of  the  algorithm.  In  the  final  section,  we  collect 
a  number  of  ideas  that  could  considerably  further  enhance  the  performance  of  the  iterative  method  as 
well  as  the  accuracy  of  the  discrelizalior.. 

The  code  used  in  the  numerical  experiments  is  available  from  the  authors. 
2.  Triangulation  of  the  region 
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The  region  is  the  union  of  a  rectangle  ii*^'  and  N,^u  small  quadrilaterals  called  inlets.  Q'^'  is 
covered  by  a  regular,  rectangular  mesh,  with  mesh  widths  h^  in  the  x-direciion  and  hy  in  the  y- 
direciion.  We  use  logically  rectangular  meshes  on  the  inlets,  see  Fig.  2,  with  common  mesh  points  on 
the  interface  between  any  mlet  and  the  rectangle. 

The  triangulaiion  of  the  rectangle  is  obtained  by  cutting  each  cell  of  the  rectangular  mesh  into 
two  triangles.  We  do  not  need  to  specify  the  direction  of  these  cuts,  since  the  same  five  point  formula 
results  when  we  use  piecewise  linear  finite  elements. 

We  triangulate  an  inlet  by  defining  a  bilinear  mapping  y,  from  a  rectangle  R,  onto  the  inlet,  /,, 
as  indicated  in  Fig.  2.  On  the  rectangle  R,  a  regular,  rectangular  mesh  is  defined.  While  the  mesh 
width  in  one  coordinate  direction  is  determined  by  the  mesh  on  the  large  rectangle,  the  other  can  be 
chosen  freely.  We  triangulate  R,  by  dividing  each  of  its  mesh  cells  into  two  triangles.  This  triangula- 
tion  induces  a  thangulation  of  the  inlet  as  shown  in  Fig.  2. 

3.  A  preconditioned  conjugate  gradient  method 

Let  Ae/?*"^  be  a  symmetric,  positive  semidefiniie  matrix,  and  let  beR"  be  a  vector  orthogonal 
to  the  kernel  of  A.  Let  AeR"*'  be  symmetric  and  positive  semidefiniie,  with  ker(A)c  ker(i4).  There 
are  several  possible  preconditioned  conjugate  gradient  algorithms  for 

Au  =  b  (3.1) 

using  the  preconditioner  A,  which  are  mathematically  equivalent  but  algorithmically  different  The 
version  given  below  is  particularly  useful  for  domain  decomposition  methods.  The  matrix  A  only 
appears  in  a  vector-matrix  multiplication  with  A-A.  In  our  application,  this  is  much  cheaper  than  mul- 
tiplying with  A.  In  addition,  we  need  a  procedure  which  gives  us  the  product  of  the  Moore-Penrose 
pseudo-inverse  A    of  A  with  an  arbitrary  vector. 

Preconditioned  conjugate  gradient  algorithm: 

Letuf°':-g€/?". 


-4  - 
Replace  ^'"^  by  its  orthogonal  projection  onto  the  range  of -4. 

^(0).=  ^(0) 
-(0)  -■>■     (0) 

g      ■=A   g^"> 


-(0)  .(0) 

d      ■.=  g 


For /=0, 1,2....: 

aO'  :=  [,0)^|C.) 


^^''((A-A)d^'+d(^':J 


(/) 


^0*1)  :=^0)_aO)[(^_^)j^\^0)] 


Replace  ^  '^'^    by  its  orthogonal  projection  onto  the  range  of  A. 


pO).= 


^(^••■"^oO+l) 


i^^'^^^ 


^0*1)  :=gO-i)+|30)^ 


0) 


-0*1        -0*1)    om^O) 

The  vectors  u^'  converge  to  a  solution  of  (3.1). 

The  orthogonal  projections  onto  the  range  of  A  have  no  effect  in  exact  arithmetic,  but  these 
steps  are  sometimes  necessary  to  obtain  convergence  in  floating  point  arithmetic. 

4.  The  domain  decomposition  algorithm 

In  this  section,  we  describe  the  Neumann-Dirichlet  domain  decomposition  method  as  given  in 
Bj^rstad  and  Widlund  (1984,1986).  Because  of  the  simple  structure  of  our  problem,  we  can  give  a 
simplified  convergence  analysis.  We  also  present  numerical  results. 


4.1  Notation  and  preliminary  remarks 

The  finite  element  approximation  of  eqs.  (1.1),  (1.2)  can  be  written  as  a  system  of  linear  equa- 
tions of  the  form 


-5 


Kx  =  b, 


(4.1) 


with  K,  X  and  b  partitioned  as  follows: 


K 


Ku     0     K,l 

f       - 

'^1 

0      a:  22    ^^23 

,    X  = 

i2 

.    ^  = 

^2 

Ki-i  K2i  AT 33 

£3 

^3 

(4.2) 


Here  the  subscript  1  refers  to  the  nodes  of  the  inlets  not  belonging  lo  the  interfaces  between  the  inlets 
and  the  rectangle,  the  subscript  2  to  the  nodes  in  the  rectangle  not  belonging  to  the  interfaces,  and  the 
subscript  3  the  nodes  on  the  interfaces. 


The  entries  of  the  stiffness  matrix  K  are  integrals  of  the  form 


JV4)Vydi. 


(4.3) 


where  <Sf  and  \f  are  canonical  basis  functions  of  the  space  of  piecewise  linear  finite  elements.  These 
integrals  can  be  written  in  the  form 


We  obtain  a  corresponding  splitting  A'=/f*"+Af*^',  where 


(4.4) 


and 


and  thus 


We  note  that  a  system  of  the  form 


a:")  = 


^11 

OK,l 

0 

0    0 

K\^ 

0  K^ 

K^^  = 


0    0       0 

0   K22    ^23 

0  Kl,  K^^ 


K,,=K'^^+Kfl 


(4.5) 


(4.6) 


(4.7) 


K2I    ^23 

£2 

.^3 

= 

'^2 
Ml 

(4.8) 


is  a  discrete  Neumann  problem  on  Q.'-^K 
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The  following  matrix,  the  so-called  Schur  complement,  plays  a  central  role  below: 

S  :=  Kj^  -  Ki-iK'^^K'^'j  -  K 23^22.1^ 23-  (•^•9) 

Since  S  can  be  shown  to  be  a  diagonal  block  in  a  block  Cholesky  factorisation  of  K,  it  is  symmetric 

and  positive  semidefinite,  and  its  kernel  is  spanO),  the  set  of  constant  vectors.  The  two  subregions 

each  contribute 

S^^:=K'^^-KjyfC;}K,:i,  j=\,2  (4.10) 

to  5.  S*'^  and  5^'  are  symmetric  and  positive  semidefinite  with  ker(S'^^)=span(l),  and 

5  =  S("+S®.  (4.11) 

The  linear  system  (4.8)  is  easy  to  solve,  using  a  standard  fast  Laplace  solvers  for  rectangular 
domains.  It  is  easy  to  see  that  the  solution  X3  of 

S<^£3  =  r3  (4.12) 

is  a  subvector  of  the  solution  of 


K22    ^^23 

.Kh  m  [x,\ 

Therefore  eq.  (4.12)  can  be  solved  with  a  fast  Poisson  solver.  The  application  of 


0 

l3| 


(4.13) 


S^'^  =  KS'^-K\,K-,lKn  (4-14) 

to  a  vector  y-^  is  also  easy  and  inexpensive.  The  main  wwk  required  is  the  solution  of  the  system 

Kuyj=K,,yy  (4.15) 

We  use  band  Gaussian  elimination  for  this  purpose,  observing  that  K^  is  block  diagonal  with  A^uj^ 
blocks.  To  reduce  the  band  width  of  a  block  of  ^f  u .  we  order  the  unknowns  as  indicated  in  Fig.  3.  We 
denote  the  number  of  nodes  on  the  interface  by  m,. 

From  the  point  of  view  of  band  width,  it  does  not  matter  whether  the  nodes  which  are  closest  to 
the  rectangle  Q'^^  are  numbered  first  or  last.  However,  we  order  them  last  for  the  following  reason. 
During  the  iteration,  the  right-hand  side  in  (4.15)  is  nonzero  only  in  the  components  corresponding  to 
the  mesh  layer  in  the  inlets  closest  to  Q*''',  and  we  only  need  the  same  components  of  £1  to  form 
5"  V3.  If  we  order  these  nodes  last,  most  of  the  work  in  the  solution  phase  of  the  Cholesky  algorithm 
becomes  unnecessary.  In  fact,  only  the  bottom  right  m,  by  m,  submatrices  of  the  triangular  factors  of 
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ihe  stiffness  matrices  for  the  inlets  need  to  be  saved.  The  amount  of  work  per  iteration  on  the  i  -th 
inlet  will  therefore  be  proportional  to  mj,  once  the  stiffness  matrices  corresponding  to  the  inlets  have 
been  factored.  Since  /^n  is  unaffected  by  the  position  of  the  bottom  of  Q^',  which  changes  with 
time,  this  matrix  is  factored  only  once. 


4.2  Statement  of  the  algorithm 


When  solving  (4. 1),  we  first  reduce  the  right-hand  side  ^  to  the  form 


0 
D 

P3| 


This  is  accomplished  by  first  solving  a  problem  with  the  matrix 


"22    "^  23 


(4.16) 


(4.17) 


i.e.  a  discrete  Neumann  problem,  and  then  a  problem  with  the  matrix  ATn  ■  A  problem  of  the  form 


Ku     0    ^,3 

y_\ 

'o 

0    K:a  K-a 

yj. 

= 

0 

^13    ^n    ^^33 

y_i 

£3 

(4.18) 


is  equivalent  to 

5^3  =  P3.  (4.19) 

which  is  solved  by  using  the  preconditioned  conjugate  gradient  method,  with  the  preconditioner  5^^'. 

Once  the  solution  y^  of  (4.19)  is  known,  the  complete  solution  of  (4.18)  is  obtained  by  solving 


t^\\y\  =-^i3>'3 


for  y  1 ,  and  then 


A  22    ^23 


'J      \.'     J 


y_i 
y\ 


Pi-Klsy  i-K^h: 


(4.20) 


(4.21) 


loryz- 


This  concludes  the  description  of  the  algorithm. 

We  note  that  we  never  need  lo  compute  the  elements  of  S  and  S'^K  The  straightforward  compu- 
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laiion  of  S  would  require  the  solution  of  p  linear  systems  with  the  matrix  /^n ,  and  the  same  number  of 
systems  with  the  matrix  A" 22,  where  p  is  the  total  number  of  nodes  on  the  interfaces  between  Q"'  and 
q(2'.  Linear  systems  with  K22  are  not  as  easy  to  solve  as  those  with  the  matrix  (4.17),  since  the  matrix 
K22  represents  a  discretized  boundary  value  problems  with  Dirichlet  conditions  along  the  interfaces, 
and  Neumann  conditions  elsewhere.  Such  a  system  is,  in  general,  non-separable  and  fast  direct  Pois- 
son  solvers  are  not  available. 

If  the  elements  of  5*^'  were  required,  they  could  also  be  computed  by  first  computing  (5*^')'', 
by  using  a  fast  Poisson  solver />  times;  cf.  eqs.  (4.12)  and  (4.13).  Thereafter(5'^*)"'  is  inverted.  As  we 
have  noted,  these  computations  are  unnecessary  for  our  algorithm. 


43  Convergence  analysis 

From  the  theory  of  the  conjugate  gradient  method,  it  is  known  that  the  number  of  steps  needed 
to  reach  a  prescribed  accuracy  is  bounded  above  by  a  function  of 


max- 


mm 


£3-S£3 


(4.22) 


where  the  maximum  and  minimum  are  taken  over  the  orthogonal  complement  of  1^;  see  e.g.  Luen- 
berger  (1969),  p.296.  This  function  grows  linearly  with  the  square  root  of  k  for  large  k. 

The  condition  number  k  is  estimated  in  two  steps;  cf.  e.g.  Bj^rstad  and  Widlund  (1986).  In  the 
first  step,  given  in  Lemma  1 ,  only  linear  algebra  is  needed.  The  second  step,  given  in  Lemma  2,  is  in 
general  more  difficult  and  requires  tools  from  analysis;  see  Widlund  (1987a).  However,  in  the  present 
case,  this  argument  can  be  very  much  simplified. 


Lemma  1:  Let  C  be  the  smallest  constant  such  that,  for  all 


Xo 

-    ,  there  is  an  xi  such  that 

i3 


''IT 
^1 


i2 
£3 


Kn     0     A:,3 

^1 

f     >  T  r ..        ..    ^    r     > 

0      ^^22    ^23 
^13    ^^23    ^33 

^2 
£3 

<c- 

£2 
.£3 

A  22    1^2-i 

£2 
.£3 

(4.23) 
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Then 


K  <  C.  (4.24) 

This  estimate  is  not  necessarily  sharp. 

Proof:  We  first  note  that  the  denominator  in  (4.22)  is  always  >1.  Therefore 

xlSx-i 
<^max   -    -     .  (4.25) 

x^S'-^'x-i 

(4.25)  is  not  necessarily  sharp.  Consider,  e.g.,  a  case  where  Q  is  a  rectangle,  divided  into  two  halves. 
Then,  by  symmetry,  5'"  =  5^\  and  therefore  the  right-hand  side  in  (4.25)  becomes  2,  but  k^I. 

The  right-hand  side  in  (4.25)  is  bounded  by  the  constant  C  referred  to  in  Lemma  1.  To  see  this, 
note  that,  from  the  definitions  of  S  and  5  ^\ 


max 


xlSX2 


£3 


max- 


^n     0 

^^13 

0    K22 

K2i 

^^13    ^^23 

K^^ 

-A!~J'i'^i3£3 

-Af22'^23£3 
£3 


—K22K  23X: 
£3 


K22    ^23 


—K22.K-22X'i 
£3 


(4.26) 


The  proof  of  Lemma  1  is  completed  by  showing,  by  a  straightforward  computation,  that 


"'mi  ^13^3 
— Af22^23£ 

£3 


A^u 

0 

Kn 

0 

^22 

K^ 

'^[3 

Kh 

^33 

L 

-IC{iKxiX-i 

—IC22K  2iX-i 

£3 


—KxiK  2iX^ 
£3 


^11 

0 

^13 

0 

A^22 

^23 

/^[3 

Kh 

^33 

il 

-^^22^23^3 
£3 


(4.27) 


for  all  vectors  X]  andx3.  D 

We  note  that  the  right-hand  side  in  (4.25)  is  exactly  equal  to  the  constant  C. 

For  Lemma  2,  the  following  notation  is  needed.  /,  is  the  i-th  inlet,  and  R,  the  corresponding 
rectangle;  cf.  Section  2  and  Fig.  2.  Recall  that  the  mesh  on  /,  is  the  image  of  a  regular  rectangular 
mesh  on  R,.  Let  h,  be  the  mesh  width  in  /?,  in  the  direction  normal  to  the  interface. 
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Lemma  2:  Assume  that  the  mesh  ratios  -r-  and  -7^  are  uniformly  bounded.  Then  the  constant  C  of 

h,  h, 

Lemma  1  is  uniformly  bounded. 
Proof:  Let 


^2 


(4.28) 

be  a  vector  representing  a  finite  element  function  4>  on  Q'-^^  The  quadratic  form  on  the  right-hand 
side  of  (4.23)  is  the  Dirichlet  integral  of  <)).  Similarly,  the  left-hand  side  of  (4'23)  is  the  Dirichlet 
integral  of  a  finite  element  function  which  extends  <t)  to  all  of  ft.  To  estimate  C,  we  should  therefore 
study  the  extension  of  finite  element  functions  from  Ci^^'  to  Q.  A  general  extension  theorem  of  this 
kind  for  conforming  finite  elements  has  been  established  by  Widlund  (1987a). 

When  studying  the  extension  problem  for  the  current  geometry,  we  first  make  three  simplifying 
assumptions:  1)  The  inlets  are  rectangular.  2)  The  mesh  widths  in  the  inlets  are  the  same  as  those  of 
n*^'.  3)  The  mirror  images  of  the  inlets  are  completely  contained  in  ii^^*  and  do  not  overlap  each 
other;  see  Fig.  4. 

The  finite  element  function  ^  is  extended  into  the  inlets  as  an  even  function  with  respect  to  the 
direction  normal  to  the  interface.  The  Dirichlet  integral  over  Q'"  of  the  extended  function  is  bounded 
by  the  Dirichlet  integral  of  ^  over  Q^^\  and  thus 

C  <  2.  (4.29) 

We  note  that  this  argument  fails  if  (1.2)  were  a  Dirichlet  condition,  since  the  extended  function 
would  have  to  vanish  on  the  three  outer  sides  of  each  inlet. 

The  remainder  of  the  proof  consists  of  the  slep-by-siep  removal  of  the  assumptions  1-3.  We 
begin  with  assumption  3.  If  the  reflected  images  of  the  inlets  in  Q^'  overlap  each  other,  (4.29)  is 
replaced  by 

C<4,  (4.30) 

since  no  point  in  Q^^^  can  be  covered  by  more  than  three  reflected  images. 

If  one  of  the  inlets  is  so  long  that  its  reflected  image  is  not  entirely  contained  in  Q'^\  the 
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reflection  does  noi  define  the  extension  in  the  entire  inlet,  but  only  in  a  piece  of  it  However,  the 
extension  can  be  completed  by  repeated  reflection.  Let,  for  example,  the  left  side  of  Q'^'  be  the  line 
j:=0,  and  the  right  side  of  ti'^^  be  the  line  x=<j.  If  {x,y)  is  a  point  in  a  left  inlet,  with  -2a  <x<-a,  then 

^(x,)-)  :=  ^(-2fl-x,y),  (4.31) 

and 

^-2a-x,y)  :=  ^(2a+x,y).  (4.32) 

This  again  leads  to  a  larger,  but  mesh-independent  C. 

Suppose  now  that  assumption  2  is  violated,  e.g.  that  the  horizontal  mesh  width  /i  in  a  left  inlet 
differs  from  the  horizontal  mesh  width  /i,  in  il*^' .  The  extension  function  is  given  by 

^(-x,y):=^x~.y)-  (4.33) 

The  Dirichlet  integral  of  ^  can  easily  be  estimated  in  terms  of  the  Dirichlet  integral  of  ^.  If  /i,S/i;  for 
all  i,  the  previous  estimate  for  C  remains  unchanged.  Otherwise,  it  is  enlarged  by  a  factor  q,  where  q 

/ij  hy 

is  the  maximum  of  all  quotients  —  and  — .  This  factor  is,  by  the  assumption  of  Lemma  2,  bounded 

h,  hi 

independently  of  the  mesh. 

Assumption  1  can  be  eliminated  by  using  a  similar  argument  The  inlets  are  images  of  rectangu- 
lar regions,  and  the  corresponding  moping  functions  y,  naturally  enter.  The  estimate  for  C  is  then 
enlarged  by  the  factor 

max  max  p[det(Dv|/,(x))(Dv,(x))-'"(Dv,(x))-']  +  o(l),  (4.34) 

1  X 

where  p[— ]  denotes  the  spectral  radius,  and  y,  are  the  bilinear  mappings  used  in  the  mesh  construc- 
tion, see  Section  2.  This  follows  from  an  estimate  of  the  Dirichlet  integral  and  a  change  of  variables. 

n 

From  Lemmas  1  and  2,  we  obtain  the  convergence  theorem: 

Theorem:  Let  the  mesh  ratios  be  bounded  as  in  Lemma  2.  Then  the  condition  number  k  defined  in 
(4.22)  is  bounded  independently  of  the  mesh. 
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This  analysis  and  the  numerical  results  below  show  that  S^^'  is  an  excellent  preconditioner  for 
S. 

It  is  also  possible  to  use  S*"  as  a  preconditioner  for  S.  As  we  noticed  previously,  there  is  a  con- 
siderable difference  between  a  Neumann  and  a  Dirichlet  problem.  For  a  Dirichlet  problem,  S*''  would 
be  the  better  choice;  see  e.g.  Fig.  8  in  Anderson  (1985).  The  reason  is  that  in  the  Dirichlet  case,  the 
extension  of  certain  finite  element  functions  from  ti^^'  into  Q'"  will  introduce  large  gradients,  since 
we  have  to  impose  homogeneous  Dirichlet  conditions  on  the  outer  sides  of  each  inlet  This  results  in 
a  large  Dirichlet  integral  over  Q*".  On  the  other  hand,  the  extension  from  Q'"  to  Q^*  can  easily  be 
accomplished  in  the  Dirichlet  case  if  assumptions  1  to  3  are  satisfied.  We  use  reflections  and  then 
extend  the  resulting  function  by  zero  to  the  rest  of  Q*^^ ,  obtaining  an  extended  function  with  twice  as 
big  a  Dirichlet  integral. 

For  Neumann  problems,  we  are  not  able  to  establish  that  preconditioning  with  S*"  leads  to  a 
large  or  small  condition  number.  From  a  practical  point  of  view,  the  use  of  S'^^  is  clearly  preferable, 
since  using  S*''  as  a  preconditioner  would  make  it  necessary  to  solve  non-separable  problems  on  fi®. 

4.4  Numerical  results 

We  have  conducted  experiments  on  the  regions  shown  in  Figs.  1  and  4.  The  size  of  the  mesh, 
including  the  mesh  points  on  the  boundary,  is  33n  by  33^  in  the  rectangle  Q^^,  and  3m.  by  9|J.  in  each 
of  the  inlets,  with  |i=l,2,3  or  4.  Tables  I  and  II  give  information  on  the  rate  of  convergence  of  the 
iterative  method.  The  choice  of  boundary  data  has  no  significant  influence  on  the  performance.  In 
these  runs,  random  boundary  data  are  used. 

We  give  the  number  of  iterations  required  for 

\p,-Syj\^<t\p,\^,  (4.35) 

cf.  eq.  (4.19),  and  also  determine  the  condition  number  k  by  using  the  coefficients  generated  by  the 
conjugate  gradient  method;  see  e.g.  O'Leary  and  Widlund  (1979). 
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5.  Using  an  inexact  solver  on  the  rectangle 

5.1  A  different  view  of  the  domain  decomposition  algorithm 

The  iteration  described  in  Section  4  requires  an  application  of  (5''")"^,  and  thus  a  call  to  a  fast 
Laplace  solver  on  Q'^^  in  every  step.  It  is  not  practical  to  replace  this  solver  by  an  approximate  solver 
without  making  other  changes  in  the  algorithm,  for  the  following  reason.  The  use  of  an  inexact  solver 
can  be  interpreted  as  replacing  (5'^')*  by  an  approximation.  It  is  easy  to  construct  an  inexact  solver 

-  (2)  -  (2) 

such  that  the  approximation  takes  the  form  (S    )*,  with  a  symmetric,  positive  semidefinite  matrix  5 
with  ker(5    )  =  span(0.  However,  the  resulting  domain  decomposition  method  requires,  in  each  step, 

-(2)  -(2) 

one  application  of  S-S  and  one  application  of  (S  )*,  or  alternatively,  one  application  of  5  and  one 
of  (5  )*.  This  is  not  practical  because  applications  of  S-S  or  S  would  require  the  exact  solution 
of  linear  systems  with  the  matrix  Af  22- 

The  following  modification  of  the  algorithm  of  Section  4.2  makes  the  use  of  an  inexact  solver 
possible.  We  first  reduce  the  right-hand  side  of  (4.1)  to  the  form 


0 

£2 

P3 


(5.1) 


This  is  accomplished  by  solving  a  problem  with  the  matrix  K^,  resulting  in  a  problem  of  the  form 


Kn     0    ^,3 

0      *^  22   '^  23 
K12    K23    ^33 


^1 

r 

0 

>2 

= 

£2 

[13 

£3 

(5.2) 


This  system  can  be  reduced  to 


f        - 

yj. 
yi 


(5.3) 


with 


R  := 


K22  f^23 

T         f  fT     lr—\ 


Eq.  (5.3)  is  solved  by  using  the  preconditioned  conjugate  gradient  method,  with  the  preconditioner 


(5.4) 


R  = 


K22    f^T2 


(5.5) 


14 


Once  the  solution  of  (5.3)  is  known,  the  complete  solution  of  (5.2)  is  obtained  by  solving 
This  concludes  the  description  of  the  modified  algorithm. 


(5.6) 


By  replacing  Lemma  1  by  Lemma  3,  the  convergence  analysis  for  the  modified  algorithm  can 
be  completed. 
Lemma  3:  The  bound  C  of  Lemma  1  is  also  a  bound  on 


l2 

T 
R 

£2 

M 

.^l 

TX\3X  - 
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£2 

R 
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£2 

M 

M 

(5.7) 


For  the  design  of  algorithms  using  inexact  solvers  on  both  subregions,  see  Bramble,  Pasciak 
and  Schatz  (1986),  Widlund  (1987b,c). 


5.2  Multigrid  cycles  with  symmetric  iteration  matrices 

In  Section  5.3,  the  fast  solver  on  OP^  will  be  replaced  by  a  multigrid  cycle.  We  will  require  the 
iteration  matrix  of  the  multigrid  cycle  to  be  symmetric.  This  can  be  accomplished  as  follows. 

Consider  a  system  of  n  linear  equations 

Ax  =  b  (5.8) 

with  A=A^>Q.  We  first  assume  that  A  is  invertible,  and  discuss  the  case  of  a  non-trivial  kernel  later. 

We  first  recall  the  definition  of  the  well-known  two-level  correction  cycle;  see  e.g.  Stiiben  and 
Trottenberg  (1982).  Let  m  be  an  integer;  m  is  usually  smaller  than  the  number  n  of  equations  in  the 
system  to  be  solved.  Let  BbR'"'^  be  a  symmetric  matrix,  which  should  be  chosen  as  an 
m -dimensional  approximation  to  .4 "' .  Let  /?  be  a  linear  mapping  Z?"-^/?".  A  two-level  correction 
step  is  defined  by 


x^  :=  Xou+R  '^BR  (b-Ax^u) 


(5.9) 
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The  complete  two-level  cycle  is  obtained  by  supplementing  (5.9)  with  a  relaxation  method.  We 
consider  Gauss-Seidel  block  relaxation  here.  Let  /  be  any  subset  of  {\,—,n}.  Given  an  approximation 
X  for  A'^b,  one  can  determine  beR"  with  5,=0  for  HI,  and  such  that  the  y-th  component  of 
b-A(x+5)  is  zero  for  all  je  I.  We  then  say  that  x+5  is  obtained  from  x  by  the  relaxation  of  the  block  /. 
Lei/j,  /2.  ■.  /r  be  subsets  of  {\,--,n},  the  union  of  which  is  {\,—,n}. 

Two-level  correction  cycle: 

Choose Xq  and  generate  x-^^xi,---  as  follows. 

x,+i  is  obtained  from  £,  by  relaxing  over  the  blocks  I  ),l2,-Jr,  carrying  out  one  step  of  (5.9), 
and  relaxing  over  the  blocks /rJr-i,"'./i- 

The  iteration  (5.9)  and  each  individual  block  relaxation  step  have  symmetric  iteration  matrices 
with  respect  to  the  energy  product,  i.e.  the  inner  product  defined  by  A.  We  can  therefore  conclude: 

Lemma  4:  The  iteration  matrix  M  of  the  two-level  correction  cycle  is  symmetric  with  respect  to  the 
energy  product. 

A  mapping  G  is  defined  as  follows.  For  beR",  Gb  is  the  result  of  a  two-level  correction  cycle 
for  the  problem 

Ax  =  b,  (5.10) 

starting  with  the  initial  guess  x=0.  Denoting 

G:^A-\  (5.11) 

we  obtain 

G  =  {I-M)G.  (5.12) 

Lemma  5:  If  A/  is  symmetric  in  the  energy  inner  product,  and  if  p(A/)<l,  then  G  is  symmetric  in  the 
euclidean  inner  product  and  positive  definite,  and 

cond(G-"^CG-"^).i±e^.  (5.13) 

l-p(Af) 

Proof:  Since  M  is  symmetric  i/i  the  energy  product,  MG  and  thus  G  are  symmetric  in  the  euclidean 
inner  product.   Next  we   show   that  G   is  positive  definite  if  p(A/)<l.    G   is  congruent  with 
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G  (l-M)G  ,  which  is  similar  to  J-M.  Since  M  is  symmetric  in  the  energy  inner  product,  M  has 
only  real  eigenvalues,  and  if  p(Af)<l,  I-M  has  only  positive  real  eigenvalues.  Using  Sylvester's 
theorem,  it  follows  that  G  has  only  positive  eigenvalues.  We  now  estimate  cond(C       GG       ): 

cond(C""^GG""^)  =  cond(C-"^GG-"2)  =  cond(G-"2(/-M)G"2)  (5.14) 

^  K^x(I-M)  _  l-X^(M)  ^  i+p(M) 

"  K^U-M)  '  \-K..m)  ~  i-p(A/)  ■  ^  ■    '* 

This  concludes  the  proof  of  Lemma  5.  □ 

Since  we  are  interested  in  multigrid  cycles  for  Neumann  problems,  we  will  now  study  the  case 
of  a  constant  null  vector.  We  consider 

Ax  =  b  (5.16) 

and  assume  that  b^R"  satisfies  the  compatibility  condition  ^1=0.  (5.9)  remains  unchanged.  If  /  is  a 

proper  subset  of  {\\—\n},  then  the  corresponding  principal  minor  of  A  is  invertible  and  therefore  the 

relaxation  of  the  block  /  is  well-defined.  The  definition  of  the  two-level  cycle  remains  unchanged. 

The  solution  is  defined  only  modulo  constants,  and  therefore  the  iteration  matrix  M  is  defined  as  a 

mapping  of  the  orthogonal  complement  of  span(r)  into  itself. 

For  b^R"  with  ^1  =  0,  let  x"'  be  the  result  of  a  two-level  cycle  for  i4j=ft,  with  the  initial  guess 
x^'"=0,  and  let  Gb  be  the  projection  of  x''^  onto  the  wthogonal  complement  of  span(l). 

Lemma  6:  G  is  symmetric  with  respect  to  the  euclidean  inner  product,  and  a  positive  definite  map- 
ping  of  the  orthogonal  complement  of  span(r)  onto  itself.  Denoting  this  restriction  by  G  ,  we  have: 


cond 


(G-)-'-G-(gVH^T^.  (5-17) 

where  G*  is  the  restriction  of  G  to  the  orthogonal  complement  of  span(2). 

The  proof  is  similar  to  that  of  Lemma  5. 

The  discussion  of  this  section  has  been  restricted  to  two-level  cycles.  However,  it  is  easy  to 
obtain  analogous  statements  for  arbitrarily  many  levels,  using,  e.g.,  V-  or  W-cycles;  see  e.g.  Stiiben 
and  Trottenberg  (1982)  for  the  definition  of  these  cycles. 
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53  The  domain  decomposition  algorithm  using  a  multigrid  cycle  as  an  inexact  solver 

We  have  carried  out  numerical  experiments  with  the  modified  domain  decomposition  algorithm 
described  in  Section  5.1,  where  the  exact  fast  solver  is  replaced  by  a  multigrid  V-cycle.  The  ratio  of 

the  mesh  widths  of  consecutive  levels  is  — .  Piece  wise  bilinear  interpolation  is  used  to  transfer  the 

corrections  from  a  given  level  to  the  next  finer  level,  and  residuals  are  transferred  from  a  given  level 
to  the  next  coarser  level  by  using  the  adjoint  of  the  bilinear  interpolation  operator.  The  relaxation 
method  is  red-black  Gauss-Seidel  iteration.  Three  half  sweeps  (red-black-red)  are  carried  out  before 
and  after  each  coarse  grid  correction  step. 

We  find  numerically  that  the  spectral  radius  of  the  iteration  matrix  of  this  cycle  is  not  larger 
than  0.18  on  /I -(-1  by  n-i-1  meshes,  where  n  is  a  power  of  2.  By  rearranging  the  relaxation  sweeps  in  a 
non-symmetric  way,  we  could  obtain  smaller  multigrid  convergence  factors  for  the  same  amount  of 
work.  We  have,  however,  used  the  symmetric  arrangement  since  it  is  needed  for  the  analysis  of  the 
method. 

By  using  the  framework  of  Sections  5.1  and  5.2,  we  can  interpret  this  method  as  a  precondi- 

tioned  conjugate  gradient  method  for  eq.  (5.3).  using  a  symmetric,  positive  definite  preconditioner  R 

—  • 
with  ker(/f  )  =  span(l_),  and  with 
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Here  C  is  the  constant  defined  in  Lemma  1. 

Table  III  repeats  the  tests  of  Table  I,  using  the  method  described  above.  The  numbers  of  itera- 
tions in  Table  III  are  somewhat  larger  than  those  in  Table  I,  but  the  method  is  substantially  faster 
because  the  multigrid  cycle  is  less  expensive  than  a  direct  solver. 
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6.  Further  directions 

In  this  section,  we  will  discuss  the  possibility  of  using  faster  solvers  on  the  rectangle  and  mesh 
refinement  in  neighborhoods  of  the  re-entrant  comers. 

We  begin  by  noticmg  that  during  the  iteration,  the  solution  of  the  discrete  elliptic  problem  on 
the  rectangle  is  only  required  on  and  next  to  the  interfaces,  i.e.  at  relatively  few  points.  Similarly,  the 
right  hand  side  of  the  equation  differs  from  zero  only  on  the  interfaces.  Problems  of  this  kind  were 
considered  by  Banegas  (1978),  and  more  recendy  by  several  Soviet  scientists;  cf.  e.g.  Bakhvalov  and 
Orekhov  (1982),  Kuznetsov  and  Matsokin  (1982).  When  solving  the  discrete  Poisson  problem  on  an  n 
by  n  mesh  using  Bakhvalov's  method,  only  on  the  order  of  nlog(n)^  operations  are  required  to  obtain 
the  solution  on  one  mesh  line  within  a  prescribed  accuracy.  When  using  this  method,  the  cost  of  solv- 
ing our  problem  on  the  entire  region  to  prescribed  accuracy  would  asymptotically,  i.e.  for  sufficienUy 
fine  meshes,  be  dominated  by  the  cost  of  the  final  step  in  which  the  solution  is  obtained  at  all  mesh 
points. 

The  problem  of  the  lack  of  accuracy  caused  by  the  re-entrant  comers  could  be  addressed  sys- 
tematically, using  tools  similar  to  those  of  this  paper.  In  the  work  of  Bai  and  Brandt  (1987),  Bramble, 
Ewing,  Pasciak  and  Schatz  (1987),  Hart  and  McComnick  (1987),  McCormick  (1984),  and  McCormick 
and  Thomas  (1986),  the  questions  of  designing  accurate  ^proximations  on  locally  refined  meshes 
and  the  effective  solution  of  the  resulting  linear  systems  of  equations  are  addressed.  Some  of  these 
algorithms  are  multigrid  methods,  while  others  require  exact  solvers  for  the  unrefined  problems  as 
well  as  the  refined  models  on  the  patches  of  refinement.  It  has  been  demonstrated  that  certain  of  these 
methods  work  quite  well  for  a  considerable  number  of  refinement  levels.  We  note  that  the  relation- 
ship between  domain  decomposition  and  iterative  refinement  methods  will  be  explored  in  a  forthcom- 
ing report;  see  Widlund  (1987b). 

What  comes  to  mind  for  our  current  problem,  would  be  to  design  a  refinement  of  the  mesh 
locally  around  the  re-entrant  comers,  possibly  using  several  successive  refinements.  If  the  refined 
patches  of  the  rectangle  were  logically  rectangular,  then  iterative  methods  could  be  designed  using 
only  fast  Poisson  solvers,  as  in  the  algorithm  of  Section  4.  We  might  prefer  to  handle  the  inlets  by 
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Gaussian  elimination,  while  the  finite  element  problems  on  the  locally  refined  meshes  of  ti'^'  could  be 
solved  inexactly,  e.g.  by  one  step  of  an  iterative  refinement  algorithm;  cf.  Hart  and  McCormick 
(1987)  and  Widlund  (1987c).  A  suitable  domain  decomposition  algorithm  would  be  used  as  an  outer 
iteration.  Such  an  algorithm  would  be  similar  to  the  one  outlined  in  Section  5.  It  is  also  possible  to  use 
domain  decomposition  on  the  different  mesh  levels  separately  and  an  iterative  refinement  method  as 
an  outer  iteration. 
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Table  I.  Expenments  for  the  region  in  Fig.  1.  The  size  of  the  mesh,  including  the  mesh  points  on  the 
boundary,  is  33)i  by  33m.  in  ^^  rectangle  fi'^\  and  3^  by  9u  in  each  of  the  inlets. 
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Table  11.  Experiments  for  the  region  in  Fig.  4.  The  size  of  the  mesh,  including  the  mesh  points  on  the 
boundary,  is  33^1  by  33^  in  the  rectangle  ti'^',  and  3p.  by  9\i  in  each  of  the  inlets. 
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Table  ni.  Experiments  for  the  region  in  Fig.  1,  using  the  method  of  Section  5.  The  size  of  the  mesh, 
including  the  mesh  points  on  the  boundary,  is  33|i  by  33n  in  the  rectangle  Q'^\  and  3n  by  9)1  in  each 
of  the  inlets. 
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List  of  legends  for  the  figures 

Figure  1:  The  region  Q  =  fi"^  ^  ^"'  ■  ^'"  is  the  union  of  the  small  pieces,  and  Q^'  is  the  rectangle. 

Figure  2:  Solid:  /,  and  its  triangulation.  Dashed:  R,  and  its  triangulation. 

Figure  3:  Ordering  of  the  unknowns  in  the  inlets. 

Figure  4:  A  domain  satisfying  assumptions  1  and  3  in  the  proof  of  Lemma  2. 
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